169 research outputs found

    Loghub: A Large Collection of System Log Datasets towards Automated Log Analytics

    Logs have been widely adopted in software system development and maintenance because of the rich system runtime information they contain. In recent years, the increase in software size and complexity has led to rapid growth in the volume of logs. To handle these large volumes of logs efficiently and effectively, a line of research focuses on intelligent log analytics powered by AI (artificial intelligence) techniques. However, only a small fraction of these techniques have reached successful deployment in industry because of the lack of public log datasets and of the necessary benchmarking on them. To fill this significant gap between academia and industry and to facilitate more research on AI-powered log analytics, we have collected and organized loghub, a large collection of log datasets. In particular, loghub provides 17 real-world log datasets collected from a wide range of systems, including distributed systems, supercomputers, operating systems, mobile systems, server applications, and standalone software. In this paper, we summarize the statistics of these datasets, introduce some practical log usage scenarios, and present a case study on anomaly detection to demonstrate how loghub facilitates research and practice in this field. At the time of writing, loghub datasets have been downloaded over 15,000 times by more than 380 organizations from both industry and academia.
    Comment: Dataset available at https://zenodo.org/record/322717

    ROME: Testing Image Captioning Systems via Recursive Object Melting

    Image captioning (IC) systems aim to generate a text description of the salient objects in an image. In recent years, IC systems have been increasingly integrated into our daily lives, for example as assistance for visually impaired people and for description generation in Microsoft PowerPoint. However, even cutting-edge IC systems (e.g., Microsoft Azure Cognitive Services) and algorithms (e.g., OFA) can produce erroneous captions, leading to incorrect captioning of important objects, misunderstanding, and threats to personal safety. Existing testing approaches either fail to handle the complex form of IC system output (i.e., sentences in natural language) or generate unnatural images as test cases. To address these problems, we introduce Recursive Object MElting (Rome), a novel metamorphic testing approach for validating IC systems. Unlike existing approaches that generate test cases by inserting objects, which easily makes the generated images unnatural, Rome melts (i.e., removes and inpaints) objects. Rome assumes that the object set in the caption of an image includes the object set in the caption of a generated image after object melting. Given an image, Rome can recursively remove its objects to generate different pairs of images. We use Rome to test one widely adopted image captioning API and four state-of-the-art (SOTA) algorithms. The results show that the test cases generated by Rome look much more natural than those of the SOTA IC testing approach and achieve naturalness comparable to the original images. Meanwhile, by generating test pairs using 226 seed images, Rome reports a total of 9,121 erroneous issues with high precision (86.47%-92.17%). In addition, we further use the test cases generated by Rome to retrain Oscar, which improves its performance across multiple evaluation metrics.
    Comment: Accepted by ISSTA 202
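The metamorphic relation stated in this abstract (the objects named in the original caption should include the objects named in the caption of the melted image) can be illustrated with a small check. This is only a sketch under stated assumptions, not the authors' implementation: the function names, the fixed `VOCAB` set, and the word-matching object extraction are all hypothetical stand-ins, and plurals or synonyms are not handled.

```python
import re

def caption_objects(caption, vocabulary):
    """Return the vocabulary words that occur in the caption (naive word match)."""
    words = set(re.findall(r"[a-z]+", caption.lower()))
    return words & vocabulary

def unexpected_objects(original_caption, melted_caption, vocabulary):
    """Objects named only in the melted image's caption.

    A non-empty result violates the relation: melting removes objects,
    so the melted caption should not mention anything new.
    """
    original = caption_objects(original_caption, vocabulary)
    melted = caption_objects(melted_caption, vocabulary)
    return melted - original

# Hypothetical object vocabulary and captions, for illustration only.
VOCAB = {"dog", "ball", "frisbee", "grass", "cat"}

consistent = unexpected_objects(
    "A dog chasing a ball on grass", "A dog on grass", VOCAB
)
suspicious = unexpected_objects(
    "A dog chasing a ball on grass", "A cat on grass", VOCAB
)
print(consistent)   # empty set: relation holds
print(suspicious)   # contains 'cat': reported as a suspicious caption pair
```

A set difference is used rather than a boolean subset test so that a report can name the offending objects.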

    Cu2O@PNIPAM core–shell microgels as novel inkjet materials for the preparation of CuO hollow porous nanocubes gas sensing layers

    There has been long-standing interest in developing metal oxide-based sensors with high sensitivity, selectivity, fast response and low material consumption. Here we report for the first time the use of Cu2O@PNIPAM core–shell microgels with a nanocube-shaped core structure for the construction of novel CuO gas sensing layers. The hybrid microgels show a significant improvement in colloidal stability compared to native Cu2O nanocubes. Consequently, a homogeneous thin film of Cu2O@PNIPAM nanoparticles can be engineered at a quite low solid content (1.5 wt%) by inkjet printing of the dispersion at an optimized viscosity and surface tension. Most importantly, thermal treatment of the Cu2O@PNIPAM microgels forms porous CuO nanocubes, which show a much faster response to relevant trace NO2 gases than sensors produced from bare Cu2O nanocubes. This outcome arises because the PNIPAM shell hinders the aggregation of CuO nanoparticles during pyrolysis, which enables full utilization of the sensor layers and better access of the gas to active sites. These results point to the great potential of such an innovative system for gas sensors with low cost, fast response and high sensitivity.
    H. J. gratefully acknowledges financial support of the CSC scholarship. S. P. acknowledges funding from the Community of Madrid under grant number 2016-T1/AMB-1695

    ImDiffusion: Imputed Diffusion Models for Multivariate Time Series Anomaly Detection

    Anomaly detection in multivariate time series data is of paramount importance for ensuring the efficient operation of large-scale systems across diverse domains. However, accurately detecting anomalies in such data poses significant challenges. Existing approaches, including forecasting and reconstruction-based methods, struggle to address these challenges effectively. To overcome these limitations, we propose a novel anomaly detection framework named ImDiffusion, which combines time series imputation and diffusion models to achieve accurate and robust anomaly detection. The imputation-based approach employed by ImDiffusion leverages the information from neighboring values in the time series, enabling precise modeling of temporal and inter-correlated dependencies and reducing uncertainty in the data, thereby enhancing the robustness of the anomaly detection process. ImDiffusion further leverages diffusion models as time series imputers to accurately capture complex dependencies. We use the step-by-step denoised outputs generated during the inference process as valuable signals for anomaly prediction, resulting in improved accuracy and robustness of the detection process. We evaluate the performance of ImDiffusion via extensive experiments on benchmark datasets. The results demonstrate that our proposed framework significantly outperforms state-of-the-art approaches in terms of detection accuracy and timeliness. ImDiffusion has further been integrated into a real production system at Microsoft, where we observe a remarkable 11.4% increase in detection F1 score compared to the legacy approach. To the best of our knowledge, ImDiffusion represents a pioneering approach that combines imputation-based techniques with time series anomaly detection, while introducing the novel use of diffusion models to the field.
    Comment: To appear in VLDB 2024. Code: https://github.com/17000cyh/IMDiffusion.gi
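The general idea of imputation-based anomaly scoring mentioned in this abstract (hide a value, impute it from its neighbors, and treat a large imputation error as a sign of anomaly) can be sketched on a toy univariate series. This is an assumption-laden illustration, not ImDiffusion itself: the neighbor-average imputer below is a deliberately simple stand-in for the diffusion-model imputer, and the function name and data are hypothetical.

```python
def imputation_scores(series):
    """Score each interior point by how far it lies from a neighbor-based imputation.

    Each point is treated as missing and imputed as the mean of its two
    neighbors (a stand-in imputer, not a diffusion model). The endpoints
    have only one neighbor and are left with a score of 0.0.
    """
    scores = [0.0] * len(series)
    for i in range(1, len(series) - 1):
        imputed = (series[i - 1] + series[i + 1]) / 2
        scores[i] = abs(series[i] - imputed)
    return scores

# Hypothetical series with a single spike at index 4.
series = [1.0, 1.1, 0.9, 1.0, 5.0, 1.0, 1.1]
scores = imputation_scores(series)

# The point with the largest imputation error is the anomaly candidate.
flagged = max(range(len(series)), key=scores.__getitem__)
print(flagged)
```

With an imputer this simple, the points next to the spike also receive elevated scores, because the spike contaminates their imputed values; a stronger imputer is one way to reduce that effect.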